Introducing the q-Theil index 
Starting from the idea of Tsallis on non-extensive statistical mechanics and the q-entropy notion, we recall
the Theil index Th and transform it into the Th_q index. Both indices can be used to map any time series onto
a new one in a nonlinear way. We develop an application of Th_q to the GDP evolution of 20 rich
countries in the time interval [1950 - 2003] and search for evidence of globalization of their economies. First we
calculate the distances between the "new" time series and to their mean, from which simple networks
are constructed. We emphasize that it is useful to take into account, as we do, different time "parameters": (i)
the moving average time window for the raw time series used to calculate the Th_q index; (ii) the moving average time
window for calculating the time series distances; (iii) a correlation time lag. This allows us to deduce optimal
conditions to measure the features of the network, i.e. the appearance in 1970 of a globalization process in
the economy of such countries and the present beginning of deviations. The q value hereby used is that which
measures the overall data distribution and is equal to 1.8125.
1. INTRODUCTION

Since the fundamental work of Boltzmann [1], the entropy
concept has been developed and applied to a range of subjects,
going from elementary thermodynamics and statistical
mechanics through quantum physics, e.g. [2], and information
theory [3], up to applications in biology, e.g. [4-6], and economy
[7-9]. Recently Tsallis and many others following his path have
shaken up the usual considerations on the entropy concept, in
particular within Shannon information theory.

In fact, complex nonequilibrium systems can often be described
by a superstatistics, which results from a superposition
of two statistics associated with two different time scales
[10-14]. The methods of extracting superstatistics
parameters from time series are discussed in [15]. In that
line of thought, special attention can be paid to the entropy
of a time series.

On the other hand, the Theil [16] index is often used in
economy and finance. It is defined through



    Th = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i}{\langle x \rangle} \ln \frac{x_i}{\langle x \rangle} ,    (1)



where the average ⟨x⟩ is taken over the N members
of the population. It looks like the Shannon entropy
but was invented to consider the event values themselves, in
particular the income x_i of agent i in a population of N agents,
rather than their probability of occurrence. One peculiarity
is that it measures the individual's share of income relative
to the mean income ⟨x⟩ of the population. With reference to
information theory, Theil's measure is the difference between the
maximum entropy and the entropy actually present at that time. Thus
from the Theil index one can look at correlations between data
sets, distances, hierarchies, and other usual features, through
various techniques of data analysis, like those resulting from
network constructions.
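
As an illustration, the population form of the index can be sketched in a few lines of Python. This is only a minimal sketch: it assumes that Eq. (1) has the standard form Th = (1/N) Σ_i (x_i/⟨x⟩) ln(x_i/⟨x⟩), and the function name `theil_index` and the sample incomes are hypothetical, chosen for the example.

```python
import math

def theil_index(values):
    """Mean of (x_i/<x>) * ln(x_i/<x>) over the population (assumed form of Eq. (1))."""
    n = len(values)
    if n == 0:
        raise ValueError("empty population")
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive")
    mean = sum(values) / n
    return sum((v / mean) * math.log(v / mean) for v in values) / n

# Hypothetical incomes: a perfectly equal population and a very unequal one.
equal = theil_index([10.0, 10.0, 10.0, 10.0])
unequal = theil_index([1.0, 1.0, 1.0, 37.0])
print(equal, unequal)
```

With this form the index vanishes when all agents have the same income and grows, up to ln N, as the income concentrates on a single agent.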



An interesting development is to consider that the x_i quantity
in Eq. (1) is time dependent. Thus one can generalize the
Theil index in order to remap, in a nonlinear way, a time series
x(t) into a Th(t), as done in Sect. 2, which recalls considerations
outlined in [17]. Moreover, in the spirit of nonextensive
statistical entropy, following Tsallis' considerations,
one can propose the q-Theil index, as also done in
Sect. 2. The first application is made here below to macroeconomic
time series, in particular to the GDP of the richest
countries. Following up on studies of correlations between
GDPs of rich countries [17-23], we
have analyzed web-downloaded data on GDP, used as individual
wealth signatures of a country's economic state ("status").
We have calculated the fluctuations of the GDP and looked for
correlations and "distances", as reported in Sect. 3.



Usually, a system is represented by a network, nodes being
scalar agents, here the countries, while links carry weights,
i.e. here measures of distances between two Th(t) series representing
GDP fluctuation correlations between two countries. In
order to extract structures from the networks, we have averaged
the time correlations in different windows. This makes the
properties of the subsequent networks more robust and reveals
evolving statistical distances. In line with our previous
works [17-20] we have examined three different network
constructions. A discussion on economy globalization
follows in Sect. 4, with a conclusion in Sect. 5. It is found that such
a measure of collective habits does fit the usual expectations
defined by politicians or economists, i.e. common factors are
to be searched for.
2. THEIL INDEX AND TSALLIS ENTROPY

The original definition of the Theil index, see Eq. (1), allows
for a peculiar mapping of a 1-D data set, like a time series:
the Theil index nonlinearly maps the "original" time series
A(t) into a new one through

    Th_A(t, T_1) = \frac{1}{T_1} \sum_{j=t}^{t+T_1} \left( \frac{A_j}{\langle A \rangle_{(t,T_1)}} \ln \frac{A_j}{\langle A \rangle_{(t,T_1)}} \right)    (2)

where the average ⟨A⟩_{(t,T_1)} is taken over the ensemble of
points j in a time window of size T_1, placed between t and
t + T_1:

    \langle A \rangle_{(t,T_1)} = \frac{1}{T_1} \sum_{j=t}^{t+T_1} A_j .    (3)

Thus the Theil index is calculated for the interval [t, t + T_1].
Applications of the Theil index notion can be found in other
papers [17, 20], where the Theil index was applied to measure
the economy globalization process.
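
The moving-window mapping can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that each window contains exactly T_1 consecutive points starting at t (the summation limits of Eq. (2) may be read as including one more point), and the names `theil_window` and `theil_map` as well as the synthetic series are hypothetical.

```python
import math

def theil_window(series, t, t1):
    """Theil index of the t1 points series[t], ..., series[t + t1 - 1]."""
    window = series[t:t + t1]
    mean = sum(window) / len(window)
    return sum((a / mean) * math.log(a / mean) for a in window) / len(window)

def theil_map(series, t1):
    """Map a time series A(t) onto Th_A(t, T_1), one value per window position."""
    return [theil_window(series, t, t1) for t in range(len(series) - t1 + 1)]

# Synthetic series growing by 3% per step (hypothetical data, not GDP values).
series = [100.0 * 1.03 ** k for k in range(10)]
mapped = theil_map(series, 5)
print(len(mapped), mapped[0])
```

Because the index only depends on the ratios A_j/⟨A⟩, a series with a constant growth rate is mapped onto a constant; the mapped series therefore responds to changes in the fluctuation pattern rather than to the overall level.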

In order to connect with Tsallis nonextensive statistics we
introduce the q-Theil index Th_q for a time series A(t)

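    Th_q^A(t, T_1) = \frac{1}{q-1} \left[ \frac{1}{T_1} \sum_{j=t}^{t+T_1} \left( \frac{A_j}{\langle A \rangle_{(t,T_1)}} \right)^{q} - 1 \right]    (4)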

in the interval [t, t + T_1]. Eq. (4) corresponds to Eq. (2) when
q → 1.
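
A numerical check of the q → 1 limit can be sketched as below. The sketch is made under an explicit assumption: the q-Theil index is taken as Th_q = [(1/T_1) Σ_j (A_j/⟨A⟩)^q - 1]/(q - 1), a form chosen here because it reduces to Eq. (2) when q → 1; it should be compared with the expression of Eq. (4) before any quantitative use. The function name `q_theil_window` and the sample window are hypothetical.

```python
import math

def q_theil_window(window, q):
    """Assumed q-Theil index of one window; falls back to the Theil index at q = 1."""
    n = len(window)
    mean = sum(window) / n
    ratios = [a / mean for a in window]
    if abs(q - 1.0) < 1e-12:
        # Limiting case: ordinary Theil index, as in Eq. (2).
        return sum(r * math.log(r) for r in ratios) / n
    return (sum(r ** q for r in ratios) / n - 1.0) / (q - 1.0)

window = [1.0, 2.0, 3.0, 4.0]               # hypothetical data
th_1 = q_theil_window(window, 1.0)           # Theil limit
th_near_1 = q_theil_window(window, 1.0 + 1e-6)
th_q = q_theil_window(window, 1.8315)        # q value quoted in the text
print(th_1, th_near_1, th_q)
```

Under this assumed form the index is zero for a constant window and positive otherwise when q > 1, since the mean of the q-th powers of the ratios is then at least 1.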

In order to compare time series, their distance can be immediately
introduced. Moreover the mean and standard deviations
(std) of an ensemble of such distances can be used in
further considerations. The distance between two time series
(here the Theil-mapped time series) is hereby defined as the
absolute value of the difference between mean values in the
interval [t, t + T_2]. Moreover the elements of the time series
can be taken at equal times or with a time lag τ, a possibility
which we also take into consideration for the sake of generality.
Thus we define

    d_{Th_q}(A,B)_{(t,T_1,T_2,\tau)} = \left| \left\langle Th_q^A(t, T_1) - Th_q^B(t+\tau, T_1) \right\rangle_{(t,T_2)} \right| .    (5)

In Eq. (5) the mean value denoted by brackets, ⟨...⟩, is defined
as in Eq. (3).
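
The distance can be sketched as follows for two series that have already been Theil-mapped. This is a minimal illustration assuming that the average runs over T_2 consecutive mapped values starting at t, and that the second series is shifted by the time lag; the name `theil_distance` and the sample values are hypothetical.

```python
def theil_distance(th_a, th_b, t, t2, tau=0):
    """|< Th_A(s) - Th_B(s + tau) >| averaged over s = t, ..., t + t2 - 1."""
    if t + t2 + tau > len(th_b) or t + t2 > len(th_a):
        raise ValueError("window and lag exceed the series length")
    diffs = [th_a[s] - th_b[s + tau] for s in range(t, t + t2)]
    return abs(sum(diffs) / t2)

# Hypothetical mapped series for two countries.
th_a = [0.1, 0.2, 0.3, 0.4, 0.5]
th_b = [0.0, 0.1, 0.1, 0.2, 0.6]
d_equal_time = theil_distance(th_a, th_b, t=0, t2=3, tau=0)
d_lagged = theil_distance(th_a, th_b, t=0, t2=3, tau=1)
print(d_equal_time, d_lagged)
```

Note that with a nonzero lag the quantity is in general not symmetric under the exchange of the two series, since only the second one is shifted.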

As a result we have three different time parameters: 

1. the T_1 time window used while calculating the Th_q index,

2. the time lag τ, and

3. the correlation window T_2.

Note: in the analysis both time windows (Th_q and correlation)
are applied in succession, so that the total size of the time window
is equal to the sum of the Th_q and correlation time windows.
Therefore the number of generated networks is equal to
the time series length minus the total time window size.



3. MACROECONOMY INDEX INPUT AND NETWORK 
CONSTRUCTION 

3.1. GDP data 

GDP data sets of the richest OECD countries were used,
i.e. 20 countries: Austria, Belgium, Canada, Denmark, Finland,
France, Greece, Ireland, Italy, Japan, the Netherlands,
Norway, Portugal, Spain, Sweden, Switzerland, Turkey, U.K.,
U.S.A., and Germany, allowing for a linear superposition
of the data before the reunification in 1991 in the latter
case; an "All" country is also invented, as in previous works
(see footnote 1). Thus N = 21. The data start in 1950 and
end in 2003, so there are 54 data points in every time series.



3.2. Networks 

The distance matrices obtained from Eq. (5) are analysed
by constructing three network structures and analysing the statistical
properties of the distances between nodes. The following
networks are considered: unidirectional minimal length path
(UMLP), bidirectional minimal length path (BMLP) and locally
minimal spanning tree (LMST). The algorithms generating
these networks are:

UMLP The network begins with an arbitrarily chosen country,
here the All country; then the closest neighbouring
country is attached and becomes the end of the network.
Next, the country closest to the end of the network is
searched for and attached. The process continues until all
countries are attached to the network.

BMLP The network begins with the pair of countries with
the smallest distance between them. Then the countries
closest to each end of the network are searched for, and the one
at the shorter distance is attached to the appropriate end.
The algorithm continues until all countries become
nodes of the network.

LMST The root of the network is the pair of closest neighbouring
countries. Then the country closest to any node already in
the network is searched for and attached. The algorithm continues
until all countries are attached to the network.
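
The three constructions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the distance matrix is assumed to be a symmetric dictionary of dictionaries, ties are broken alphabetically (an arbitrary choice made here), and the function names and the four-node example are hypothetical. The LMST growth rule, as described, is the same as Prim's algorithm started from the closest pair.

```python
def closest_pair(dist):
    """Pair of distinct nodes with the smallest distance (ties broken by name)."""
    nodes = sorted(dist)
    pairs = ((a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:])
    return min(pairs, key=lambda p: (dist[p[0]][p[1]], p))

def umlp(dist, start):
    """Chain grown from `start` by attaching the node closest to the current end."""
    chain, remaining = [start], set(dist) - {start}
    while remaining:
        end = chain[-1]
        nxt = min(remaining, key=lambda c: (dist[end][c], c))
        chain.append(nxt)
        remaining.remove(nxt)
    return chain

def bmlp(dist):
    """Chain grown from the closest pair by extending whichever end is nearer."""
    a, b = closest_pair(dist)
    chain, remaining = [a, b], set(dist) - {a, b}
    while remaining:
        left = min(remaining, key=lambda c: (dist[chain[0]][c], c))
        right = min(remaining, key=lambda c: (dist[chain[-1]][c], c))
        if dist[chain[0]][left] < dist[chain[-1]][right]:
            chain.insert(0, left)
            remaining.remove(left)
        else:
            chain.append(right)
            remaining.remove(right)
    return chain

def lmst(dist):
    """Tree grown from the closest pair by attaching the node nearest to any tree node."""
    a, b = closest_pair(dist)
    edges, in_tree = [(a, b)], {a, b}
    remaining = set(dist) - in_tree
    while remaining:
        candidates = ((u, v) for u in in_tree for v in remaining)
        u, v = min(candidates, key=lambda e: (dist[e[0]][e[1]], e))
        edges.append((u, v))
        in_tree.add(v)
        remaining.remove(v)
    return edges

# Hypothetical example: four nodes placed on a line.
pos = {"A": 0.0, "B": 1.0, "C": 3.0, "D": 7.0}
dist = {a: {b: abs(pos[a] - pos[b]) for b in pos} for a in pos}
print(umlp(dist, "C"), bmlp(dist), lmst(dist))
```

In this example the UMLP chain depends on the starting node, whereas the BMLP and LMST seeds are fixed by the distance matrix itself, which is the distinction drawn in the text for the role of the All country.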

Notice that in the UMLP construction, All is at the beginning
of the chain, while in the other two constructions, All is
treated as a "normal" country. The BMLP and LMST network
seeds are the pairs of closest countries according
to the appropriate distance matrix. The first two networks
are linear, and essentially robust against a "perturbation", like
removing or adding a country or the case of a regrettable
mathematical error, since they are based on a measure relative
to a statistical mean, while the LMST is obviously a tree,
rather compact when only 21 data points, thus very few
branching levels, are involved in the construction. It is known
that such a tree is far from robust.

1 The set deviates somewhat from previous works since there is
neither Iceland nor Luxembourg but there is Turkey in the present paper.



4. RESULTS 

First let us report that a q value for the considered data set
must be given. It could be left as a free parameter and one
could find some optimal value according to some criterion, or
a few criteria. Here, i.e. for the GDP time series in
1950-2003 for the countries defined in Sec. 3.1, we have calculated
the q value used in the following considerations by the maximum
likelihood estimator [26], as for the Tsallis entropy, and
found q = 1.8315, hereby used to calculate the index and the
distances through Eq. (4) and Eq. (5). It is fair to recall that Borges
calculated the (Tsallis entropy) q value for the GDP of USA, Brasil,
Germany and UK. He found a q value varying from 1.4 (UK)
up to 2.1 (Brasil), with the USA value lying in between.



4.1. q-Theil distance statistics 

In our analysis UMLP, BMLP and LMST networks were
constructed for all time windows, starting from T_1 = 5 yrs and
T_2 = 1 yr, moving along the time axis by a one year step. Eleven
time lag values were considered: τ ∈ {0, 1, ..., 10} yrs. The T_1, T_2
and τ parameters satisfy the inequality T_1 + T_2 + τ < 54 yrs, so
the number of generated networks (N_net) depends on the time
window sizes and is equal to N_net = 54 - T_1 - T_2 - τ for a
given triplet, times 3 due to the type of network considered.
In total this is a huge number of networks. Therefore only some
cases are extracted for the present report (footnote 2). Different
presentations can be made in a three dimensional time coordinate
space. We propose a visualisation of the data through a spectrogram
method, using for the x and y axes the time windows
T_2 and T_1 respectively, for a given τ. The data values are represented
by colored pixels (footnote 3). The results of the
calculations of the mean, but also the values of the corresponding
standard deviations, are presented here below.
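
The counting rule can be sketched as a small helper. The function name `network_count` and the sample triplets are hypothetical; the relation N_net = 54 - T_1 - T_2 - τ and the factor 3 for the three network types are those stated in the text.

```python
def network_count(t1, t2, tau, n_points=54):
    """Number of networks of one type for a (T_1, T_2, tau) triplet, in years."""
    total = t1 + t2 + tau
    if total >= n_points:
        raise ValueError("T_1 + T_2 + tau must be smaller than the series length")
    return n_points - total

# Two sample triplets (hypothetical choices among those discussed later on).
short_windows = network_count(5, 10, 0)      # 54 - 15 = 39
long_windows = network_count(15, 15, 10)     # 54 - 40 = 14
all_types = 3 * short_windows                # UMLP, BMLP and LMST together
print(short_windows, long_windows, all_types)
```

The statistics therefore rest on fewer networks when the windows and the lag are long, which should be kept in mind when comparing standard deviations across the (T_1, T_2) plane.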

The mean value and standard deviation of the distances between
nodes as a function of T_1 and T_2 are presented in
Figs. 1-9 for the time lags τ = 0, 5, 10 yrs. The largest value
of the mean distance, the minimum mean distance, and the maximum
and minimum standard deviations as a function of the
time windows T_1, T_2 and time lags are presented in Table I.
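
The averaging "over the network links and the time" can be sketched as follows. This is only an illustration under stated assumptions: each network is represented by the list of its link lengths, all links of all networks are pooled with equal weight, and the population standard deviation is used; whether the original analysis uses the population or the sample estimator is not specified. The name `link_statistics` and the numbers are hypothetical.

```python
import statistics

def link_statistics(networks):
    """Mean and standard deviation of the link lengths pooled over all networks."""
    lengths = [d for links in networks for d in links]
    return statistics.mean(lengths), statistics.pstdev(lengths)

# Two hypothetical networks (e.g. two successive years), three links each.
networks = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]
mean_distance, std_distance = link_statistics(networks)
print(mean_distance, std_distance)
```

One such pair of numbers would correspond to one pixel of the (T_1, T_2) maps for a given time lag and network type.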

It can first be observed in general that the mean distance between
countries and the corresponding standard deviation are
the biggest for UMLP networks and the smallest for LMST
networks. It is also worth noticing that the mean distance depends
on the time lag value: if the time lag is large, the mean
distance is large as well. The maximum of the mean distance
occurs for the longest T_1 and the shortest T_2 window sizes.
The minimum mean distance is found with the opposite combination
of the time window sizes, i.e. small T_1 and large T_2.
The standard deviations increase with the time lag and are the
largest in the case of the longest considered time lag.

2 All cases are available from the authors upon request.

3 The results are presented in grey tones, but online figures are available in
color.

FIG. 1: Mean distance between countries as obtained in the case of
the Th_q mapping and the UMLP network construction. The distance
is averaged over the network links and the time. The values of T_1 and
T_2 are given on the vertical and horizontal axis respectively. Time lag
value: τ = 0 yrs. The color scale in use is indicated. A few numerical
values are given in Table I.

FIG. 2: Mean distance between countries deduced from Th_q for
UMLP networks. The distance is averaged over the network links
and the time. The sizes of T_1 and T_2 are presented on the vertical and
horizontal axis respectively. Time lag τ = 5 yrs. The color scale in
use is indicated. A few numerical values are given in Table I.
TABLE I: The maximum mean distance, the minimum mean distance, and the maximum and minimum standard deviations resulting for each type
of network are given for a few characteristic τ values, when Th_q is calculated for q = 1.8315. Recall that the "mean" is that of the distances
between nodes on the indicated network in the ensemble of networks generated for the given time windows. The values of the averaging
windows T_1 and T_2 when this maximum (minimum) occurs are indicated; the corresponding number of networks (N_net) used for the statistics
is also indicated.
FIG. 3: Mean distance between countries deduced from Th_q for
UMLP networks. The distance is averaged over the network links
and the time. The sizes of T_1 and T_2 are presented on the vertical and
horizontal axis respectively. Time lag τ = 10 yrs. The color scale in
use is indicated. A few numerical values are given in Table I.

FIG. 4: Mean distance between countries deduced from Th_q for
BMLP networks. The distance is averaged over the network links
and the time. The sizes of T_1 and T_2 are presented on the vertical and
horizontal axis respectively. Time lag τ = 0 yrs. The color scale in use
is indicated. A few numerical values are given in Table I.

FIG. 5: Mean distance between countries deduced from Th_q for
BMLP networks. The distance is averaged over the network links
and the time. The sizes of T_1 and T_2 are presented on the vertical and
horizontal axis respectively. Time lag τ = 5 yrs. The color scale in
use is indicated. A few numerical values are given in Table I.

FIG. 6: Mean distance between countries deduced from Th_q for
BMLP networks. The distance is averaged over the network links
and the time. The sizes of T_1 and T_2 are presented on the vertical and
horizontal axis respectively. Time lag τ = 10 yrs. The color scale in
use is indicated. A few numerical values are given in Table I.

FIG. 7: Mean distance between countries deduced from Th_q for
LMST networks. The distance is averaged over the network links
and the time. The sizes of T_1 and T_2 are presented on the vertical and
horizontal axis respectively. Time lag τ = 0 yrs. The color scale in use
is indicated. A few numerical values are given in Table I.




FIG. 10: Yearly evolution of the mean and standard deviation of the
links between nodes for the UMLP network deduced from the Th_q analysis
of the GDP of countries, when T_1 = 5 yrs, T_2 = 10 yrs, for different
time lags τ.

FIG. 11: Yearly evolution of the mean and standard deviation of the
links between nodes for the UMLP network deduced from the Th_q analysis
of the GDP of countries, when T_1 = 10 yrs, T_2 = 5 yrs, for different
time lags τ. (Plot legend: τ = 0, 2, 4, 6, 8, 10 yrs; horizontal axis
ticks: 1965, 1975, 1985, 1995.)



FIG. 8: Mean distance between countries deduced from Th_q for
LMST networks. The distance is averaged over the network links
and the time. The sizes of T_1 and T_2 are presented on the vertical and
horizontal axis respectively. Time lag τ = 5 yrs. The color scale in
use is indicated. A few numerical values are given in Table I.




4.2. q-Theil network evolution 

For further discussion the following time window sizes
were chosen: (T_1 = 5 yrs, T_2 = 10 yrs), (T_1 = 10 yrs,
T_2 = 5 yrs), (T_1 = 10 yrs, T_2 = 10 yrs), (T_1 = 15 yrs, T_2 = 15
yrs), for the three time lags τ = 0, 5, 10 yrs. The evolutions of the
mean distance between countries and the corresponding standard
deviations for these chosen time window sizes are presented
in Figs. 10-21. Arrows and straight lines indicate remarkable
features.

FIG. 9: Mean distance between countries deduced from Th_q for
LMST networks. The distance is averaged over the network links
and the time. The sizes of T_1 and T_2 are presented on the vertical and
horizontal axis respectively. Time lag τ = 10 yrs. The color scale in
use is indicated. A few numerical values are given in Table I.

FIG. 12: Yearly evolution of the mean and standard deviation of the
links between nodes for the UMLP network deduced from the Th_q analysis
of the GDP of countries, when T_1 = 10 yrs, T_2 = 10 yrs, for different
time lags τ.

FIG. 13: Yearly evolution of the mean and standard deviation of the
links between nodes for the UMLP network deduced from the Th_q analysis
of the GDP of countries, when T_1 = 15 yrs, T_2 = 15 yrs, for different
time lags τ.

FIG. 14: Yearly evolution of the mean and standard deviation of the
links between nodes for the BMLP network deduced from the Th_q analysis
of the GDP of countries, when T_1 = 5 yrs, T_2 = 10 yrs, for different
time lags τ.

FIG. 16: Yearly evolution of the mean and standard deviation of the
links between nodes for the BMLP network deduced from the Th_q analysis
of the GDP of countries, when T_1 = 10 yrs, T_2 = 10 yrs, for different
time lags τ.

FIG. 17: Yearly evolution of the mean and standard deviation of the
links between nodes for the BMLP network deduced from the Th_q analysis
of the GDP of countries, when T_1 = 15 yrs, T_2 = 15 yrs, for different
time lags τ.

The general observations to be made at this stage are the 
following 



• In all considered networks (UMLP, BMLP and LMST)
and for all window sizes three types of evolution can be
distinguished: increase, decrease and relatively stable
mean distance between countries.

• These three types of evolution are better seen for long
lag times (τ > 5 yrs). Therefore the lag time seems to
be crucial in any analysis and discussion of the globalization
process. This might suggest that some countries
play the role of leaders while others follow their way.

• It is worth noticing that for the very long lag time τ =
10 yrs and time windows [(T_1 = 5 yrs, T_2 = 10 yrs),
(T_1 = 10 yrs, T_2 = 5 yrs), (T_1 = 10 yrs, T_2 = 10 yrs)] the
maximum of the mean distance occurs about 1960, and

• since then the size of the network(s) decreases fast
over a decade, up to 1970, and

• thereafter remains small and relatively stable up to 2000
or so,

• when the mean size seems to increase again.

FIG. 15: Yearly evolution of the mean and standard deviation of the
links between nodes for the BMLP network deduced from the Th_q analysis
of the GDP of countries, when T_1 = 10 yrs, T_2 = 5 yrs, for different
time lags τ.

FIG. 18: Yearly evolution of the mean and standard deviation of the
links between nodes for the LMST network deduced from the Th_q analysis
of the GDP of countries, when T_1 = 5 yrs, T_2 = 10 yrs, for different
time lags τ.

FIG. 19: Yearly evolution of the mean and standard deviation of the
links between nodes for the LMST network deduced from the Th_q analysis
of the GDP of countries, when T_1 = 10 yrs, T_2 = 5 yrs, for different
time lags τ.

FIG. 20: Yearly evolution of the mean and standard deviation of the
links between nodes for the LMST network deduced from the Th_q analysis
of the GDP of countries, when T_1 = 10 yrs, T_2 = 10 yrs, for different
time lags τ.

FIG. 21: Yearly evolution of the mean and standard deviation of the
links between nodes for the LMST network deduced from the Th_q analysis
of the GDP of countries, when T_1 = 15 yrs, T_2 = 15 yrs, for different
time lags τ.



5. CONCLUSIONS 

In conclusion, the most interesting results of this analysis 
are 

• The analysis shows the existence of a globalization process
from 1960 till 1970, observed as a decrease of the network
size, its stabilisation thereafter, and a destabilisation
after 2000.

• The observation of the globalization process does not
depend on the type of network constructed.

• The mean distance between countries and the corresponding
std are the largest for the UMLP networks and
the smallest for the corresponding LMST networks.

• With increasing time lag, the Theil mapping window
size T_1 at which the maximum of the network size is
found always decreases.

• The globalization process is better seen if the lag time
is greater than 5 yrs, which might be considered as the
time needed for some synchronization process, but is
also in fact commensurate with most government life
times and election time intervals. These conjectures
suggest further investigations.

• Even though for large time lags the mean values are
large, the globalization evolution is the same as for short
time lags (greater than 5 yrs); thus a large time lag magnifies
the globalization process feature, which is then easier
to observe.

Of course much more work is in order to connect the above
to some nonextensive thermostatistics ideas. The search for a
robust ("optimal") q value and the significance of the q-Theil
index are open questions. Finally let us stress the interest
of studying graphs, in particular of deriving weighted networks
such as in this paper, in order to obtain some coherence in
comparative data organisation.